2019-04-08

Objective

To analyze the ICU Admissions dataset to develop a predictive model for Vital Status.

Outline

Introduction

Introduction to Dataset

  • 200 Subjects
  • Part of a larger study
  • Main goal: to predict the probability of survival to hospital discharge for patients with risk factors associated with ICU mortality

Author(s): Stanley Lemeshow, Daniel Teres, Jill Spitz Avrunin and Harris Pastides
Source: Journal of the American Statistical Association, Vol. 83, No. 402 (Jun., 1988), pp. 348-356

Data collection Method

  • Collected by nurses from patients admitted to the adult general ICU at Baystate Medical Center in Springfield, Massachusetts between Feb 1 and Aug 15, 1983.
  • Coronary care, cardiac surgery, burn patients, and patients under 14 were excluded.
  • Information collected at five different times: Admission, 24 hours, 48 hours, ICU discharge, hospital discharge.

Meet Eleanor

Summary of Dataset

Variables:

Demographics

Age Pyramid

n=200

Distribution of Race

n=200

Race by Status

n=200

Percentage Survival by Age

n=200

Shapiro

Introduction

The Shapiro-Wilk test is a test of normality.

Ho: the data are normally distributed
Ha: the data are not normally distributed

Systolic

Reject the null hypothesis and conclude that systolic bp is not normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  Systolic
## W = 0.98369, p-value = 0.0204

Systolic

n=200

## [1] 200 179

Age

Reject the null hypothesis and conclude that age is not normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  Age
## W = 0.92836, p-value = 2.507e-08

Age

n=200

## [1] 23 97

Heart Rate

Reject the null hypothesis and conclude that heart rate is not normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  HeartRate
## W = 0.98598, p-value = 0.04478

Heart Rate

n=200

## [1] 125  48

Conclusion

None of the continuous variables were normally distributed.
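The decision rule behind this conclusion can be sketched in a few lines. This is a minimal Python illustration, not the authors' code (the outputs above come from R); the p-values are copied from the Shapiro-Wilk outputs in this section, and the function name and the 0.05 default are assumptions made for the example.

```python
# p-values as reported in the Shapiro-Wilk outputs of this section
reported_p = {"Systolic": 0.0204, "Age": 2.507e-08, "HeartRate": 0.04478}


def rejects_normality(p_value, alpha=0.05):
    """Return True when the null hypothesis of normality is rejected at level alpha."""
    return p_value < alpha


for name, p in reported_p.items():
    print(f"{name}: p = {p:.4g}, reject Ho at alpha = 0.05: {rejects_normality(p)}")

# The same rule at a stricter (assumed) level of 0.01
rejected_at_001 = [name for name, p in reported_p.items() if rejects_normality(p, 0.01)]
print("Rejected at alpha = 0.01:", rejected_at_001)
```

The conclusion depends on the chosen level: all three variables reject normality at 0.05, but the Systolic and Heart Rate p-values are close to that threshold.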

Wilcoxon

Wilcoxon Introduction

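The Wilcoxon rank sum test (also known as the Mann-Whitney U test) is a non-parametric statistical hypothesis test used to compare two independent samples. It assesses whether values in one group tend to be larger than values in the other, without assuming that the data are normally distributed. This is the test reported in the outputs of this section; the signed-rank variant applies to paired or repeated measurements, which is not the case here.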

Ho: the distributions are the same

Ha: the distributions are not the same
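As a rough illustration of how the W statistic is built, the sketch below pools two samples, ranks them (ties share the mean rank), and subtracts the smallest possible rank sum from the first group's rank sum. This is a Python sketch under the assumption that W follows the convention used by R's rank sum output; the function names and the sample values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """W = (sum of the ranks of x in the pooled sample) - n_x * (n_x + 1) / 2."""
    ranks = average_ranks(list(x) + list(y))
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


# Hypothetical values, invented for illustration only
group_a = [34, 51, 67]
group_b = [70, 75, 88]
print(rank_sum_w(group_a, group_b))  # every value in group_a is below group_b
print(rank_sum_w(group_b, group_a))  # the reverse comparison
```

W ranges from 0 (every value in the first group is smaller) to n_x * n_y (every value is larger); values near the middle of that range are consistent with the null hypothesis.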

Age vs CPR

##  No Yes 
##  63  60
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Age by CPR
## W = 1273, p-value = 0.7775
## alternative hypothesis: true location shift is not equal to 0

We fail to reject the null hypothesis (p = 0.7775): there is no evidence that the distribution of Age differs between patients who did and did not receive CPR.

Age vs Status

##  Died Lived 
##    68    61
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Age by Status
## W = 4031.5, p-value = 0.01112
## alternative hypothesis: true location shift is not equal to 0

We reject the null hypothesis (p = 0.01112) and conclude that the distribution of Age differs between patients who lived and those who died.

Age vs Cancer

##  No Yes 
##  63  62
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Age by Cancer
## W = 1937, p-value = 0.5782
## alternative hypothesis: true location shift is not equal to 0

We fail to reject the null hypothesis (p = 0.5782): there is no evidence that the distribution of Age differs between patients with and without cancer.

HeartRate vs Infection

##    No   Yes 
##  88.5 106.0
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  HeartRate by Infection
## W = 3056, p-value = 6.952e-06
## alternative hypothesis: true location shift is not equal to 0

We reject the null hypothesis (p = 6.952e-06) and conclude that the distribution of Heart Rate differs between patients with and without an infection.

Wilcoxon Conclusion

In conclusion, the distribution of Heart Rate differs between patients with and without an infection, and the distribution of Age differs between patients who lived and those who died. No difference in the distribution of Age was detected between patients with and without cancer, or between patients who did and did not receive CPR.

Chi-Square Independence Test

Introduction

The chi-square test of independence assesses whether two categorical variables are associated.

Ho: No association between the two variables

Ha: An association exists between the two variables

Sex

Sex       Status: Died   Status: Lived   Total
Female    16 (40%)       60 (37.5%)      76 (38%)
Male      24 (60%)       100 (62.5%)     124 (62%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=0.012 · df=1 · φ=0.021 · p=0.913
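The statistics under the Sex table can be reproduced from the four cell counts. The sketch below is a Python illustration rather than the authors' code; it assumes (based on the numbers matching) that the reported χ2 uses Yates' continuity correction while φ is derived from the uncorrected statistic. Function names are invented for the example.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5, not below 0
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Sex by Status counts from the table: Female (16 died, 60 lived), Male (24 died, 100 lived)
chi2_yates = chi_square_2x2(16, 60, 24, 100)
phi = math.sqrt(chi_square_2x2(16, 60, 24, 100, yates=False) / 200)
p = p_value_df1(chi2_yates)
print(round(chi2_yates, 3), round(phi, 3), round(p, 3))
```

With these counts the sketch gives χ2 ≈ 0.012, φ ≈ 0.021 and p ≈ 0.913, in line with the reported values.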

Race

Race      Status: Died   Status: Lived   Total
Black     1 (2.5%)       14 (8.8%)       15 (7.5%)
Other     2 (5%)         8 (5%)          10 (5%)
White     37 (92.5%)     138 (86.2%)     175 (87.5%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=1.810 · df=2 · Cramer's V=0.095 · Fisher's p=0.494

Service

Service    Status: Died   Status: Lived   Total
Medical    26 (65%)       67 (41.9%)      93 (46.5%)
Surgical   14 (35%)       93 (58.1%)      107 (53.5%)
Total      40 (100%)      160 (100%)      200 (100%)

χ2=5.981 · df=1 · φ=0.185 · p=0.014

Cancer

Cancer    Status: Died   Status: Lived   Total
No        36 (90%)       144 (90%)       180 (90%)
Yes       4 (10%)        16 (10%)        20 (10%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=0.000 · df=1 · φ=0.000 · Fisher's p=1.000
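Several tables in this section report a Fisher's exact p-value. The following Python sketch shows one common way to compute a two-sided value for a 2x2 table: sum the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. It is an illustration under that assumption, not the routine the authors ran, and the function name is invented.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so tables tied with the observed one are included
    total = sum(
        prob(k) for k in range(lowest, highest + 1) if prob(k) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Cancer by Status counts from the table: No (36 died, 144 lived), Yes (4 died, 16 lived)
p_cancer = fisher_exact_two_sided(36, 144, 4, 16)
print(round(p_cancer, 3))
```

For the Cancer table the observed counts equal the expected counts exactly, so every alternative table is no more likely than the observed one and the p-value is 1, matching the reported Fisher's p=1.000.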

Renal

Renal     Status: Died   Status: Lived   Total
No        32 (80%)       149 (93.1%)     181 (90.5%)
Yes       8 (20%)        11 (6.9%)       19 (9.5%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=4.976 · df=1 · φ=0.179 · Fisher's p=0.029

Infection

Infection   Status: Died   Status: Lived   Total
No          16 (40%)       100 (62.5%)     116 (58%)
Yes         24 (60%)       60 (37.5%)      84 (42%)
Total       40 (100%)      160 (100%)      200 (100%)

χ2=5.759 · df=1 · φ=0.182 · p=0.016

CPR

CPR       Status: Died   Status: Lived   Total
No        33 (82.5%)     154 (96.2%)     187 (93.5%)
Yes       7 (17.5%)      6 (3.8%)        13 (6.5%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=7.821 · df=1 · φ=0.223 · Fisher's p=0.005

Previous

Previous   Status: Died   Status: Lived   Total
No         33 (82.5%)     137 (85.6%)     170 (85%)
Yes        7 (17.5%)      23 (14.4%)      30 (15%)
Total      40 (100%)      160 (100%)      200 (100%)

χ2=0.061 · df=1 · φ=0.035 · Fisher's p=0.624

Type

Type        Status: Died   Status: Lived   Total
Elective    2 (5%)         51 (31.9%)      53 (26.5%)
Emergency   38 (95%)       109 (68.1%)     147 (73.5%)
Total       40 (100%)      160 (100%)      200 (100%)

χ2=10.527 · df=1 · φ=0.244 · p=0.001

Fracture

Fracture   Status: Died   Status: Lived   Total
No         37 (92.5%)     148 (92.5%)     185 (92.5%)
Yes        3 (7.5%)       12 (7.5%)       15 (7.5%)
Total      40 (100%)      160 (100%)      200 (100%)

χ2=0.000 · df=1 · φ=0.000 · Fisher's p=1.000

PO2

PO2       Status: Died   Status: Lived   Total
No        35 (87.5%)     149 (93.1%)     184 (92%)
Yes       5 (12.5%)      11 (6.9%)       16 (8%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=0.718 · df=1 · φ=0.083 · Fisher's p=0.324

PH

PH        Status: Died   Status: Lived   Total
No        36 (90%)       151 (94.4%)     187 (93.5%)
Yes       4 (10%)        9 (5.6%)        13 (6.5%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=0.416 · df=1 · φ=0.071 · Fisher's p=0.297

PCO2

PCO2      Status: Died   Status: Lived   Total
No        36 (90%)       144 (90%)       180 (90%)
Yes       4 (10%)        16 (10%)        20 (10%)
Total     40 (100%)      160 (100%)      200 (100%)

χ2=0.000 · df=1 · φ=0.000 · Fisher's p=1.000

Bicarbonate

Bicarbonate   Status: Died   Status: Lived   Total
No            35 (87.5%)     150 (93.8%)     185 (92.5%)
Yes           5 (12.5%)      10 (6.2%)       15 (7.5%)
Total         40 (100%)      160 (100%)      200 (100%)

χ2=1.014 · df=1 · φ=0.095 · Fisher's p=0.187

Creatinine

Creatinine   Status: Died   Status: Lived   Total
No           35 (87.5%)     155 (96.9%)     190 (95%)
Yes          5 (12.5%)      5 (3.1%)        10 (5%)
Total        40 (100%)      160 (100%)      200 (100%)

χ2=4.112 · df=1 · φ=0.172 · Fisher's p=0.029

Consciousness

Consciousness   Status: Died   Status: Lived   Total
Conscious       8 (20%)        2 (1.2%)        10 (5%)
Unconscious     32 (80%)       158 (98.8%)     190 (95%)
Total           40 (100%)      160 (100%)      200 (100%)

χ2=19.901 · df=1 · φ=0.344 · Fisher's p<0.001

Chi-Square Conclusion

In conclusion, Service, Renal, Infection, CPR, Type, Creatinine, and Consciousness are each associated with Status at the 0.05 level.

Correlation

Correlation Introduction

Correlation analysis measures the strength and direction of the relationship between two variables.

Since age, heart rate, and systolic blood pressure are not normally distributed, Spearman's rank correlation coefficient was used to test for correlation between these variables.
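Spearman's coefficient is the Pearson correlation computed on ranks instead of raw values, which is why it does not require normality. The sketch below is a Python illustration with hypothetical numbers (they are not taken from the ICU dataset), and the function names are invented for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages and systolic pressures, invented for illustration only
ages = [23, 40, 55, 71, 88]
systolic = [110, 150, 120, 160, 130]
rho = spearman_rho(ages, systolic)
print(round(rho, 3))
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves the coefficient unchanged.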

Spearman's Rank Correlation Coefficient

n=200

Systolic and Age

n=200

Systolic and Heart Rate

n=200

Age and Heart Rate

n=200

Correlation Matrix

n=200

Correlation Conclusion

From the prior analysis, we conclude that Systolic, Heart Rate, and Age are not correlated with one another, so they should not distort each other's estimates when used together as predictor variables. Two categorical variables, Type and Service, are correlated (-0.54) and could influence each other if both are used as predictor variables.

Regression

Regression Introduction

A logistic regression model is developed in this section to find which variables in the dataset predict status.

Ho: None of the independent variables in the data set predict hospital mortality of ICU patients, based on information available at the time of ICU admissions.

Ha: Some of the independent variables in the data set do predict hospital mortality of ICU patients, based on information available at the time of ICU admissions.

Preliminary Model

n=200

Analysis of Preliminary Model

The predictor variables CancerYes, Age, TypeEmergency, and ConsciousnessConscious have a statistically significant relationship with Vital Status.

Based on the odds ratios, level of consciousness at admission (no coma) is the strongest predictor of Vital Status (odds ratio 18.95).

BIC MODEL: Bayesian Information Criterion

The Bayesian information criterion (BIC) is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function and is closely related to the Akaike information criterion (AIC).
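The relationship between the two criteria can be made concrete with their standard definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. The Python sketch below uses a hypothetical log-likelihood and parameter count chosen only for illustration; only n = 200 comes from this analysis.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters fitted to n observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fit: log-likelihood and parameter count are invented for illustration
log_lik, k, n = -70.0, 5, 200
print("AIC:", round(aic(log_lik, k), 2))
print("BIC:", round(bic(log_lik, k, n), 2))
print("Per-parameter penalty: AIC = 2, BIC =", round(math.log(n), 2))
```

With n = 200 the BIC penalty per parameter is ln(200), roughly 5.3, compared with 2 for AIC, so BIC-based selection tends to retain fewer variables.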

Analysis

We then verified our findings using stepwise (forward-backward) model selection based on BIC.

The BIC model identified the following independent variables as statistically significant:

  1. ConsciousnessUnconscious
  2. TypeEmergency
  3. CancerYes
  4. Age

We achieved a lower AIC score of 149.1

Splitting Dataset

The dataset was split into a training sample (70%) and a testing sample (30%).

The training sample size is 140 and the testing sample size is 60.
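A 70/30 split of this kind can be sketched as a seeded shuffle followed by a cut. This is a Python illustration, not the authors' procedure; the seed value, the function name, and the use of integer subject identifiers are assumptions made for the example.

```python
import random


def train_test_split(ids, train_fraction=0.7, seed=2019):
    """Shuffle the ids reproducibly and cut them into training and testing parts."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split can be repeated
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 200 subjects, identified here by hypothetical row numbers 0..199
train_ids, test_ids = train_test_split(range(200))
print(len(train_ids), len(test_ids))
```

Fixing the seed matters for a report like this one: a different random split of only 200 subjects can noticeably change the fitted coefficients and p-values.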

Prediction of our Training Model

The intent of our training model is to predict the odds of mortality (Vital Status = 1, died) based on the information available at the time of ICU admission, using the best predictor variables: CancerYes, Age, TypeEmergency, ConsciousnessConscious, and Systolic.

Logistic Regression of Training

Outcome: Status

Predictors                   Odds Ratios   CI               p
(Intercept)                  7.99          0.05 – 1269.19   0.422
Cancer: Yes                  0.07          0.01 – 0.84      0.037
Age                          0.96          0.94 – 0.99      0.010
Type: Emergency              0.02          0.00 – 0.47      0.014
Consciousness: Unconscious   21.24         2.27 – 199.17    0.007
Systolic                     1.02          1.00 – 1.04      0.057

Observations: 140
Cox & Snell's R2 / Nagelkerke's R2: 0.274 / 0.424
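To show how odds ratios from a logistic model translate into a predicted probability, the sketch below takes the natural log of each printed odds ratio to recover an approximate coefficient, sums the contributions, and applies the inverse logit. This is a Python illustration only: the odds ratios are the rounded values from the table, so results are rough; the patient is hypothetical; and the source does not state which Status level the model treats as the event, so the output is the probability of whichever level that is.

```python
import math

# Odds ratios as printed (rounded) in the training-model table
odds_ratios = {
    "intercept": 7.99,
    "cancer_yes": 0.07,
    "age": 0.96,
    "type_emergency": 0.02,
    "unconscious": 21.24,
    "systolic": 1.02,
}


def predicted_probability(patient):
    """Sum the log-odds contributions, then apply the inverse logit."""
    log_odds = math.log(odds_ratios["intercept"])
    for name, value in patient.items():
        # Each coefficient is approximated by the log of its rounded odds ratio
        log_odds += value * math.log(odds_ratios[name])
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical patient, invented for illustration only
patient = {"cancer_yes": 0, "age": 60, "type_emergency": 1, "unconscious": 0, "systolic": 120}
probability = predicted_probability(patient)
print(round(probability, 3))
```

An odds ratio below 1 lowers the predicted odds of the modeled event for each unit increase in the predictor, and an odds ratio above 1 raises them.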

Analysis of Training Model

For our Training Model, the following variables were identified as statistically significant at an alpha level of 0.05 for predicting hospital mortality of ICU patients. The variables are listed with their p-values:

1.  Consciousness(Unconscious): smallest p-value of 0.007
2.  Age: 0.010
3.  Systolic: 0.01817
4.  Type(Emergency): 0.014
5.  Cancer(Yes): 0.037

Analysis of Training Model

Null Deviance > Residual Deviance?

   YES, decreases 55 points, indicating a good model.

AIC Value = 101.18

  The lowest AIC so far; a lower AIC value indicates an improvement
  in the model.

Confidence Intervals:

 The confidence intervals for Cancer, Age, Type, and Consciousness do not
 include 1, which indicates these variables are statistically significant.
 The interval for Systolic (1.00 – 1.04) reaches 1.

Is the model an improvement?

Our training model is an improvement upon our BIC Logistic Regression Model as the significance of each variable based upon p-values aligns with the predictability of the odds ratios. In addition, our AIC value is the lowest at a value of 101.18, with the inclusion of Systolic.

Use Training Model to predict Testing Model

Ho: The predicted values in the training model cannot be used to predict Vital Status and/or hospital mortality of ICU patients, based on information available at the time of ICU admissions; the predictions of the testing model are not statistically significant.

Ha: The predicted values in the training model can be used to predict Vital Status and/or hospital mortality of ICU patients, based on information available at the time of ICU admissions; the predictions of the testing model are statistically significant.

Correlation

The relevant null hypothesis is Ho: the predicted values of Status and the actual values are not correlated.

## 
##  Pearson's product-moment correlation
## 
## data:  x$actual and x$predicted
## t = 2.1478, df = 58, p-value = 0.03592
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01880832 0.49148596
## sample estimates:
##       cor 
## 0.2714367
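The t statistic and confidence interval in this output follow directly from the correlation and the sample size. The sketch below is a Python illustration using the standard formulas t = r * sqrt(df / (1 - r^2)) and the Fisher z-transformation for the interval; function names are invented, and n = 60 is inferred from df = 58.

```python
import math
from statistics import NormalDist


def correlation_t(r, n):
    """t statistic for testing that a Pearson correlation equals zero."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r))


def fisher_z_interval(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Values from the output: r = 0.2714367, df = 58 (so n = 60)
r, n = 0.2714367, 60
t_stat = correlation_t(r, n)
lower, upper = fisher_z_interval(r, n)
print(round(t_stat, 4), round(lower, 4), round(upper, 4))
```

The interval excludes 0 but its lower bound is close to it, and r^2 is about 0.07, so the predicted values explain only a small share of the variation in the actual outcomes.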

Plot the predicted vs the actual values.

Regression of actual on predicted

Conclusion

Limitations

  • Sample Size (Original sample had 2700 people)
    • Our sample size, n = 200 and variables k = 19…not ideal for BIC
  • Study results not suggested for multiple ICU admission scenarios
  • High error rate in data collection

Conclusion

The model to predict vital status included (in order of high to low statistical significance):

  1. Consciousness(Deep Stupor, Coma)
  2. Age
  3. Systolic
  4. Type(Emergency)
  5. Cancer(Yes)
  • Accuracy of training model to predict mortality = 15.9%

  • Based on our analysis, we DO NOT reject the null hypothesis.

Remember Eleanor? What happened to her?

Prognosis